In this paper we present a novel approach to spam filtering and demonstrate its applicability to SMS messages. Our approach requires minimal feature engineering and only a small set of labelled data samples. Features are extracted using topic modelling based on latent Dirichlet allocation, and a comprehensive data model is then created using a Stacked Denoising Autoencoder (SDA). Topic modelling summarises the data, providing ease of use and high interpretability, since the topics can be visualised as word clouds. Given that SMS messages can be regarded as either spam (unwanted) or ham (wanted), the SDA is able to model the messages and accurately discriminate between the two classes without the need for a pre-labelled training set. The results are compared against state-of-the-art spam detection algorithms: our proposed approach achieves over 97% accuracy, which compares favourably with the best algorithms reported in the literature.
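To make the denoising-autoencoder idea concrete, the following is a minimal sketch of a single denoising layer trained on topic-proportion vectors. It is not the authors' implementation: the toy vectors stand in for LDA output and are invented for illustration, and the layer sizes, masking-noise rate, learning rate and all names (`mask_noise`, `DenoisingAutoencoder`, `run_demo`) are assumptions. A stacked model would train several such layers in sequence, each on the hidden codes of the previous one.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def mask_noise(x, rate, rng):
    """Corrupt x by zeroing each entry independently with probability `rate`."""
    return [0.0 if rng.random() < rate else v for v in x]


class DenoisingAutoencoder:
    """One hidden layer, sigmoid units, squared reconstruction error."""

    def __init__(self, n_in, n_hidden, rng):
        self.W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)]
                   for _ in range(n_hidden)]
        self.b1 = [0.0] * n_hidden
        self.W2 = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)]
                   for _ in range(n_in)]
        self.b2 = [0.0] * n_in

    def encode(self, x):
        return [sigmoid(sum(w * v for w, v in zip(row, x)) + b)
                for row, b in zip(self.W1, self.b1)]

    def decode(self, h):
        return [sigmoid(sum(w * v for w, v in zip(row, h)) + b)
                for row, b in zip(self.W2, self.b2)]

    def loss(self, x):
        """Squared error between x and its reconstruction from clean input."""
        y = self.decode(self.encode(x))
        return sum((yi - xi) ** 2 for yi, xi in zip(y, x))

    def train_step(self, x_clean, x_noisy, lr):
        """One SGD step: encode the corrupted input, reconstruct the clean one."""
        h = self.encode(x_noisy)
        y = self.decode(h)
        # Gradient of the squared error w.r.t. output pre-activations.
        d_out = [2.0 * (yi - xi) * yi * (1.0 - yi)
                 for yi, xi in zip(y, x_clean)]
        # Back-propagate to hidden pre-activations (uses W2 before its update).
        d_hid = [hj * (1.0 - hj) * sum(self.W2[i][j] * d_out[i]
                                       for i in range(len(d_out)))
                 for j, hj in enumerate(h)]
        for i, di in enumerate(d_out):
            for j, hj in enumerate(h):
                self.W2[i][j] -= lr * di * hj
            self.b2[i] -= lr * di
        for j, dj in enumerate(d_hid):
            for k, xk in enumerate(x_noisy):
                self.W1[j][k] -= lr * dj * xk
            self.b1[j] -= lr * dj


def run_demo(epochs=500, noise_rate=0.25, lr=0.5, seed=0):
    """Train on toy topic vectors; return mean loss before and after."""
    rng = random.Random(seed)
    # Hypothetical per-message topic proportions (4 topics), not real data.
    data = [
        [0.70, 0.20, 0.05, 0.05],
        [0.60, 0.30, 0.05, 0.05],
        [0.05, 0.05, 0.60, 0.30],
        [0.10, 0.05, 0.70, 0.15],
    ]
    dae = DenoisingAutoencoder(n_in=4, n_hidden=2, rng=rng)
    before = sum(dae.loss(x) for x in data) / len(data)
    for _ in range(epochs):
        for x in data:
            dae.train_step(x, mask_noise(x, noise_rate, rng), lr)
    after = sum(dae.loss(x) for x in data) / len(data)
    return before, after


if __name__ == "__main__":
    before, after = run_demo()
    print("mean reconstruction error before:", before)
    print("mean reconstruction error after: ", after)
```

Because the network must reconstruct the clean vector from a corrupted copy, it cannot simply learn the identity map. This is the property that lets the hidden codes serve as a compact model of the messages.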